Abstract

This paper is concerned with density estimation of directional data
on the sphere. We introduce a procedure based on thresholding on a
new type of spherical wavelets called needlets. We establish a minimax
result and prove its optimality. We are motivated by astrophysical
applications, in particular in connection with the analysis of ultra high
energy cosmic rays.
1 Introduction

We consider the problem of estimating the density $f$ of an independent
sample of points $X_1, \dots, X_n$ observed on the $d$-dimensional sphere
$\mathbb{S}^d$ of $\mathbb{R}^{d+1}$. Obviously, the most immediate examples
of applications appear in the case $d = 2$. However, no major differences
arise from considering the general case.

There is an abundant literature about this type of problem. In particular,
minimax $L^2$ results have been obtained (see [Kle99], [Kle03]). These
procedures are generally obtained using either kernel methods (but in this
case the manifold structure of the sphere is not well taken into account), or
using orthogonal series methods associated with spherical harmonics (and
in this case the local performances of the estimator are quite poor, since
spherical harmonics are spread all over the sphere).

In our approach we focus on two important points. We aim at an estimation
procedure which is efficient from an $L^2$ point of view (as it is a tradition
in statistics to evaluate the procedure with the mean square error). On the
other hand, we would like it to perform satisfactorily also from a local point
of view (in infinity norm, for instance). Meeting these two requirements
together seems to us a guarantee of good results in practice. Indeed, it
is very difficult to produce a loss function which reflects at the same time the
requirement of clearly seeing the bumps of the density, of being able to
estimate different level sets well, of testing whether there is a difference
between the northern and southern hemispheres, and so on.

In addition, we require this procedure to be simple to implement, as well 
as adaptive to inhomogeneous smoothness. 

Requirements of this type are generally well handled using thresholding
estimates associated to wavelets. The problem requires a special construction
adapted to the sphere, since usual tensorized wavelets will never reflect the
manifold structure of the sphere and will necessarily create unwanted
artifacts. Recently in ([NPW06b], [NPW06a]) a tight frame (i.e. a redundant
family) was produced which enjoys enough properties to be successfully used
for density estimation.

The fundamental properties of wavelets are their concentration in the 
Fourier domain as well as in the space domain. Here, obviously the 'space' 
domain is the sphere itself whereas the Fourier domain is now obtained by 
replacing the 'Fourier' basis by the basis of Spherical Harmonics which plays 
an analogous role on the sphere. 

The construction of [NPW06b], [NPW06a] produces a family of functions
which very much resemble wavelets, the needlets, and which in particular have
very good concentration properties.

We use these needlets to construct an estimation procedure, and prove 
that this procedure attains optimal rates over various spaces of regularity. 

Again, the problem of choosing appropriate spaces of regularity on the
sphere is a serious question, and we decided to consider the spaces which
may be the closest to our natural intuition: those which generalize to the
sphere the classical Hölder spaces.

In the first section we present needlets ([NPW06b]) and describe spaces
of regularity on the sphere. In the second one we define our estimation
procedure and describe its properties.

The novelties of this paper lie in the application of thresholding to the
needlet coefficients, which gives a very simple and adaptive procedure which
works on the sphere. We also focus here on giving the results in $L^\infty$
norm, and obtain the rates of convergence for many other loss functions as a
consequence of the previous ones.

Our results are motivated by many recent developments in the area of 
observational astrophysics. As an example, we refer to experiments mea- 
suring incoming directions of Ultra High Energy Cosmic Rays, such as the 
AUGER Observatory (http://www.auger.org). Here, efficient estimation 
of the density function of these directional data may yield crucial insights 
into the physical mechanisms generating the observations. More precisely, a 
uniform density would suggest the High Energy Cosmic Rays are generated 
by cosmological effects, such as the decay of massive particles generated dur- 
ing the Big Bang; on the other hand, if these Cosmic Rays are generated 
by astrophysical phenomena (such as acceleration into Active Galactic
Nuclei), then we should observe a density function which is highly non-uniform
and tightly correlated with the local distribution of nearby galaxies.
Massive amounts of data in this area are expected to be available in the next
few years. The Auger observatory will be based on two arrays of detectors;
the first one covers an area larger than 3000 km$^2$ in Pampa Amarilla
(Argentina), and has already started to collect observations: some preliminary
evidence was provided in [Col08], and a non-uniform distribution seems to
be favored. The whole celestial sphere will actually be covered only when
the construction of the northern hemisphere array, due to be built in eastern
Colorado, is completed, a few years from now. Hence, in the immediate
future efficient statistical techniques will be in great demand for the
analysis of the forthcoming datasets.

A survey of statistical methodologies dealing with directional data on the
sphere may be found in [Mar72], [Jup95], [MJ00]. The generalization of
estimation using orthogonal series methods to the case of compact Riemannian
manifolds can be found in [Hen03]. See related works in [HK96], [Ruy89],
[HJR93], [Pel05], [Jup08]. Kernel methods on the sphere have been
investigated in [HWC87]. Minimax rates for the equivalent of Sobolev spaces on
the sphere can be found in [Kle99], [Kle00], [Kle03].

The plan of the paper is as follows. In §2 and §3 we review some back- 
ground material on needlets and Besov spaces. §4 introduces our threshold- 
ing estimator, whose minimax performances are stated in §5. §6 shows the 
performance of the estimators on some simulated data. §7-§9 contain the 
proofs. 

2 Needlets 

This construction is due to Narcowich, Petrushev and Ward [NPW06b].
Its aim is essentially to build a very well localized tight frame constructed
using spherical harmonics, as discussed below. It was recently extended
with fruitful statistical applications to more general Euclidean settings (see
[KPPW07]) and already exploited for estimation and testing problems in
[BKMP06], [BKMP07].

Let us denote by $\mathbb{S}^d$ the unit sphere of $\mathbb{R}^{d+1}$. We
denote by $dx$ the surface measure of $\mathbb{S}^d$, that is the unique
positive measure on $\mathbb{S}^d$ which is invariant by rotation and has
total mass $\omega_d = 2\pi^{(d+1)/2}/\Gamma(\frac{d+1}{2})$. The following
decomposition is well known:

$$L^2(dx) = \bigoplus_{l=0}^{\infty} \mathcal{H}_l, \tag{1}$$

where $\mathcal{H}_l$ is the restriction to $\mathbb{S}^d$ of the homogeneous
polynomials on $\mathbb{R}^{d+1}$ of degree $l$ which are harmonic (i.e.
$\Delta P = 0$, where $\Delta$ is the Laplacian on $\mathbb{R}^{d+1}$). This
space is called the space of spherical harmonics of degree $l$ (see [SW],
chap. 4, [VMK88], chap. 5). Its dimension is equal to
$g_{l,d} = \binom{l+d}{d} - \binom{l+d-2}{d}$ and is therefore of order
$l^{d-1}$. The orthogonal projector on $\mathcal{H}_l$ is given by the kernel
operator

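$$\forall f \in L^2(dx), \qquad P_{\mathcal{H}_l} f(x) = \int_{\mathbb{S}^d} L_l(\langle x,y\rangle) f(y)\,dy \tag{2}$$

where $\langle x,y\rangle$ is the standard scalar product of
$\mathbb{R}^{d+1}$, and $L_l$ is the Gegenbauer polynomial with parameter
$\frac{d-1}{2}$ of degree $l$, defined on $[-1,+1]$ and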



normalized so that 









l,k 



Ar{d)7rd+^ '''' ■ 

For the main situation of interest, d — 2, the right hand side above is equal 
to ^i^. Recall that if d = 2, the usual normalization of the Leeendre 
polynomial ($L_l(1) = 1$) gives the square of their $L^2$ norm equal to
$\frac{2}{2l+1}$. Therefore these must be multiplied by $(2l+1)/(4\pi)$, in
order to satisfy (3).
Let us point out the following reproducing property of the projection 
operators: 

$$\int_{\mathbb{S}^d} L_l(\langle x,y\rangle) L_k(\langle y,z\rangle)\,dy = \delta_{l,k} L_l(\langle x,z\rangle). \tag{4}$$

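The construction of needlets is based on the classical Littlewood-Paley
decomposition and a subsequent discretization.

Let $\phi$ be a $C^\infty$ function on $\mathbb{R}$, symmetric and decreasing
on $\mathbb{R}^+$, supported in $|\xi| \le 1$, such that
$1 \ge \phi(\xi) \ge 0$ and $\phi(\xi) = 1$ if $|\xi| \le \frac12$. We set

$$b^2(\xi) = \phi(\xi/2) - \phi(\xi) \ge 0$$

so that

$$\forall\, |\xi| \ge 1, \qquad \sum_{j \ge 0} b^2(\xi/2^j) = 1. \tag{5}$$

Remark that $b(\xi) \ne 0$ only if $\frac12 \le |\xi| \le 2$. Let us now
define the operator $\Lambda_j = \sum_{l \ge 0} b^2(l/2^j) L_l$ and the
associated kernel

$$\Lambda_j(x,y) = \sum_{l \ge 0} b^2(l/2^j) L_l(\langle x,y\rangle) = \sum_{2^{j-1} < l < 2^{j+1}} b^2(l/2^j) L_l(\langle x,y\rangle).$$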
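As a numerical illustration of the window $b^2$ and of the partition of unity (5), here is a minimal sketch. It assumes one particular cutoff $\phi$, obtained from the rescaled primitive of the bump $x \mapsto e^{-1/(1-x^2)}$; the helper names `make_phi` and `b2` are hypothetical and any cutoff with the stated properties would do.

```python
import numpy as np

def make_phi(n_grid=4001):
    """Return a smooth cutoff phi with phi = 1 on |xi| <= 1/2 and phi = 0 on |xi| >= 1.

    Assumption: the transition is the normalised primitive of exp(-1/(1-u^2)).
    """
    u = np.linspace(-1.0, 1.0, n_grid)
    bump = np.zeros_like(u)
    inner = np.abs(u) < 1.0
    bump[inner] = np.exp(-1.0 / (1.0 - u[inner] ** 2))
    # Normalised primitive of the bump (trapezoidal rule), rising from 0 to 1.
    steps = 0.5 * (bump[1:] + bump[:-1]) * np.diff(u)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]

    def phi(xi):
        a = np.abs(np.asarray(xi, dtype=float))
        # Map the transition band [1/2, 1] affinely onto [-1, 1].
        return 1.0 - np.interp(np.clip(4.0 * a - 3.0, -1.0, 1.0), u, cdf)

    return phi

phi = make_phi()

def b2(xi):
    """b^2(xi) = phi(xi/2) - phi(xi); it vanishes outside 1/2 <= |xi| <= 2."""
    xi = np.asarray(xi, dtype=float)
    return phi(xi / 2.0) - phi(xi)

# Check (5) on 1 <= xi <= 100: levels j = 0..7 suffice because 2^8 > 2 * 100.
xi = np.linspace(1.0, 100.0, 500)
partition = sum(b2(xi / 2.0 ** j) for j in range(8))
print(np.max(np.abs(partition - 1.0)))  # expected to be at rounding level
```

The sum telescopes to $\phi(\xi/2^{8}) - \phi(\xi)$, which is why a finite number of levels is enough on a bounded range of $\xi$.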

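The following proposition is obvious:
Proposition 1. For every $f \in L^2$,

$$f = \lim_{J\to\infty} L_0(f) + \sum_{j=0}^{J} \Lambda_j(f). \tag{6}$$

Moreover, if $M_j(x,y) = \sum_{l \ge 0} b(l/2^j) L_l(\langle x,y\rangle)$, then

$$\Lambda_j(x,y) = \int M_j(x,z) M_j(z,y)\,dz. \tag{7}$$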

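Let

$$\mathcal{P}_l = \bigoplus_{m=0}^{l} \mathcal{H}_m$$

be the space of the restrictions to $\mathbb{S}^d$ of the polynomials of
degree $\le l$. The following quadrature formula is true: for all
$l \in \mathbb{N}$ there exists a finite subset
$\mathcal{X}_l \subset \mathbb{S}^d$ and positive real numbers
$\lambda_\eta > 0$, indexed by the elements $\eta \in \mathcal{X}_l$,
such that

$$\forall f \in \mathcal{P}_l, \qquad \int f(x)\,dx = \sum_{\eta \in \mathcal{X}_l} \lambda_\eta f(\eta). \tag{8}$$

Then the operator $M_j$ defined in the subsection above is such that
$z \mapsto M_j(x,z) \in \mathcal{P}_{[2^{j+1}]}$, so that

$$z \mapsto M_j(x,z) M_j(z,y) \in \mathcal{P}_{[2^{j+2}]},$$

and we can write:

$$\Lambda_j(x,y) = \int M_j(x,z) M_j(z,y)\,dz = \sum_{\eta \in \mathcal{X}_{[2^{j+2}]}} \lambda_\eta M_j(x,\eta) M_j(\eta,y).$$

This implies:

$$\Lambda_j f(x) = \int \Lambda_j(x,y) f(y)\,dy = \sum_{\eta \in \mathcal{X}_{[2^{j+2}]}} \lambda_\eta M_j(x,\eta) \int M_j(\eta,y) f(y)\,dy$$
$$= \sum_{\eta \in \mathcal{X}_{[2^{j+2}]}} \sqrt{\lambda_\eta}\, M_j(x,\eta) \int \sqrt{\lambda_\eta}\, M_j(y,\eta) f(y)\,dy.$$

We denote

$$\mathcal{X}_{[2^{j+2}]} = \mathcal{Z}_j, \qquad \psi_{j\eta}(x) := \sqrt{\lambda_\eta}\, M_j(x,\eta) \quad \text{for } \eta \in \mathcal{Z}_j.$$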

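The choice of the sets $\mathcal{Z}_j$ of cubature points is not unique, but
one can impose the conditions

$$\frac1c 2^{jd} \le \#\mathcal{Z}_j \le c\, 2^{jd}, \qquad \frac1c 2^{-jd} \le \lambda_\eta \le c\, 2^{-jd} \tag{9}$$

for some $c > 0$. Actually in the simulations of §6 we make use of some sets
of cubature points for $d = 2$ such that $\#\mathcal{Z}_j = 2^{2j+4}$ exactly
(the corresponding weights being however not identical). We have, using (6),

$$f = L_0(f) + \sum_{j} \sum_{\eta \in \mathcal{Z}_j} \langle f, \psi_{j\eta}\rangle_{L^2(\mathbb{S}^d)}\, \psi_{j\eta}. \tag{10}$$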
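For $d = 2$ a needlet is a zonal function of the angle $\theta$ between $x$ and the cubature point, since $L_l = \frac{2l+1}{4\pi} P_l$ with $P_l$ the Legendre polynomial. The sketch below evaluates this profile. It is only an illustration under stated assumptions: the cutoff $\phi$ is built from the primitive of the bump $e^{-1/(1-x^2)}$, the cubature weights are taken equal to $4\pi/2^{2j+4}$ (the actual weights are not identical), and the names `make_phi` and `needlet_profile` are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def make_phi(n_grid=4001):
    """Smooth cutoff: 1 on |xi| <= 1/2, 0 on |xi| >= 1 (assumed bump-primitive shape)."""
    u = np.linspace(-1.0, 1.0, n_grid)
    bump = np.zeros_like(u)
    inner = np.abs(u) < 1.0
    bump[inner] = np.exp(-1.0 / (1.0 - u[inner] ** 2))
    steps = 0.5 * (bump[1:] + bump[:-1]) * np.diff(u)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]

    def phi(xi):
        a = np.abs(np.asarray(xi, dtype=float))
        return 1.0 - np.interp(np.clip(4.0 * a - 3.0, -1.0, 1.0), u, cdf)

    return phi

def needlet_profile(j, theta, phi):
    """psi_j(theta) = sqrt(lambda) * sum_l b(l/2^j) (2l+1)/(4 pi) P_l(cos theta), d = 2."""
    l = np.arange(2 ** (j + 1) + 1)
    b_sq = phi(l / 2.0 ** (j + 1)) - phi(l / 2.0 ** j)   # b^2(l / 2^j)
    b = np.sqrt(np.clip(b_sq, 0.0, None))
    coeffs = b * (2 * l + 1) / (4.0 * np.pi)             # Legendre coefficients
    lam = 4.0 * np.pi / 2 ** (2 * j + 4)                 # assumption: equal weights
    return np.sqrt(lam) * legval(np.cos(theta), coeffs)

phi = make_phi()
theta = np.linspace(0.0, 2.0, 401)
psi3 = needlet_profile(3, theta, phi)
psi4 = needlet_profile(4, theta, phi)
print(psi3[0], psi4[0])   # peak values at the cubature point
```

Under these assumptions the peak value roughly doubles from $j = 3$ to $j = 4$, in line with the factor $2^{jd/2}$, while the profile becomes small away from $\theta = 0$.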

The main result of Narcowich, Petrushev and Ward [NPW06b] is the following
localization property of the $\psi_{j\eta}$, called needlets: for any $k$
there exists a constant $c_k$ such that, for every $\xi \in \mathbb{S}^d$:

$$|\psi_{j\eta}(\xi)| \le \frac{c_k\, 2^{jd/2}}{(1 + 2^{j} d(\eta,\xi))^{k}}, \tag{11}$$

where $d$ is the natural geodesic distance on the sphere (for $d = 2$,
$d(\xi,\eta) = \arccos\langle \eta,\xi\rangle$). In other words needlets are
almost exponentially localized around any cubature point, which motivates
their name. From this localization property it follows (see [NPW06b]) that
for $1 \le p \le +\infty$ there exist positive constants $c_p$, $C_p$ such
that

$$c_p\, 2^{jd(\frac12 - \frac1p)} \le \|\psi_{j\eta}\|_p \le C_p\, 2^{jd(\frac12 - \frac1p)}. \tag{12}$$

Also the following holds 




Figure 1: The value of the needlet $\psi_{j\xi}$ as a function of the distance
from the cubature point $\xi$ for $j = 3$ (dots) and $j = 4$ (solid). This
emphasizes the localization properties of needlets as $j$ increases.



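Lemma 2. 1) For every $0 < p \le +\infty$,

$$\Big\| \sum_{\xi \in \mathcal{Z}_j} \lambda_\xi \psi_{j\xi} \Big\|_p \le c\, 2^{jd(\frac12 - \frac1p)} \Big( \sum_{\xi \in \mathcal{Z}_j} |\lambda_\xi|^p \Big)^{1/p}. \tag{13}$$

2) For every $1 \le p \le +\infty$,

$$\Big( \sum_{\xi \in \mathcal{Z}_j} |\langle f, \psi_{j\xi}\rangle|^p \Big)^{1/p} 2^{jd(\frac12 - \frac1p)} \le c\, \|f\|_p. \tag{14}$$
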



Proof. Let us prove (13) for $p = +\infty$. Using (11) and Lemma 6 of [BKMP06],

sup E -^cV^J.C - ™P \h\ s'^P E I^J.?(^)I - 



£c6S<i 



jeiT, 



< sup |A^|c3 sup V 7- 






{eiT, 



'i- / . v,/o ,/> ^TT < ci2-'''^/2 sup lAfl 



If 1 < p < +00, by Holder inequality, if ^ + ^ = 1 so that ^ = p — \, 



p-i 



<C32-'-i(P-i) ^ |Ad^|^,,c(x)| 



< 



< 



where the last inequality comes again from (11) and Lemma 6 of [BKMP06].
Now integrating and using (12) for $p = 1$,

II E Ae^,,a^)f <2^'*^"''^ E lAcniv-.^dii <2^'^(''-'^ E 1^?!'' 



from which (13) follows. The remaining case $0 < p \le 1$ follows immediately
by subadditivity, as



"■' ■ ' "IIP 

lip 



E^«V',.C(^) '< E Ikl'U.A^W: 



As for 2) clearly if p = +oo 

C2^''*/2 sup |(/,^,,c)| < 02^"/' sup f\f{x)\\^,,^{x)\dx < 

<C2-"^/'||/||o,sup|j^,-4l|i<C'||/|U 
and if p = 1 

^2-=''/^ f\f{x)\Y^ \^,,^ix)\dx<C\\f\W. 

Let now 1 < p < oo 

^ |(/,(p,,^)r2^'^(^/2-i) < 2^'^(^/2-i) ^ n\fix)\\^,,^ix)\dx 

But, by Holder inequality, for p' such that - + - =1, then^ = p — 1 and 



\fix)\\ip,^d^)\dx) ^(y |/(:r)||^,,5(x)|i/P|(p,,5(x)|i/p'dx) < 



< 

So 

^ |(/,<^^.^)|P||2Mp/2-i)<2-^-i(p-i)2Mp/2-i)^ j\f{x)Y'\^,.,^{x)\dx^ 

= 2-^'^' f\fixWj2 \'P,A^)\dx<C\\f\\;. 

Example 3. Relation (12) for $p = 2$ states that the $L^2$ norm of
$\psi_{j\eta}$ is bounded with respect to $j$ and also bounded away from $0$
from below. Assume $d = 2$. Then using (4) it is actually easy to see that,
keeping in mind that $L_l(1) = \frac{2l+1}{4\pi}$,

$$\|\psi_{j\eta}\|_2^2 = \lambda_\eta \sum_{l \ge 0} b^2(l/2^j) L_l(1) = \lambda_\eta \sum_{l \ge 0} b^2(l/2^j)\, \frac{2l+1}{4\pi}.$$

Assuming that the cubature points are of cardinality $2^{2j+4}$ and that the
weights sum up to $4\pi$, $\lambda_\eta \sim 4\pi \cdot 2^{-(2j+4)}$ as
$j \to \infty$. If the previous relation were an equality we could recognize
in the right hand term the Riemann sum

$$2^{-(2j+4)} \sum_{l \ge 0} (2l+1)\, b^2(l/2^j)$$

that converges, as $j \to \infty$, to the integral

$$I = \frac18 \int_{1/2}^{2} t\, b^2(t)\,dt$$

which depends on the choice of the function $b$. This $L^2$ norm shall appear
frequently in the sequel. For instance, if we write down the development
(10) for the function $f = \psi_{j_0\eta_0}$ then the coefficient
$\beta_{j_0\eta_0} = \langle f, \psi_{j_0\eta_0}\rangle$ would be exactly
equal to $\|\psi_{j_0\eta_0}\|_2^2$. As it is clear that it would be
desirable for this coefficient to be as large as possible, the value of the
integral above can be seen as a measure of the localization properties of the
system of needlets and can be used as a criterion of goodness of the choice
of the function $b$. With the choice we made (see §6) the quantity $I$ above
is $\approx 0.107$.
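The convergence of the Riemann sum of Example 3 to the integral $I$ can be checked numerically. The following is a sketch under an assumption: the cutoff $\phi$ is the rescaled primitive of the bump $e^{-1/(1-x^2)}$, and the helper names (`make_phi`, `riemann_sum`, `integral_I`) are hypothetical.

```python
import numpy as np

def make_phi(n_grid=4001):
    """Smooth cutoff: 1 on |xi| <= 1/2, 0 on |xi| >= 1 (assumed bump-primitive shape)."""
    u = np.linspace(-1.0, 1.0, n_grid)
    bump = np.zeros_like(u)
    inner = np.abs(u) < 1.0
    bump[inner] = np.exp(-1.0 / (1.0 - u[inner] ** 2))
    steps = 0.5 * (bump[1:] + bump[:-1]) * np.diff(u)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]

    def phi(xi):
        a = np.abs(np.asarray(xi, dtype=float))
        return 1.0 - np.interp(np.clip(4.0 * a - 3.0, -1.0, 1.0), u, cdf)

    return phi

def riemann_sum(j, phi):
    """2^{-(2j+4)} * sum_l (2l+1) b^2(l / 2^j), with b^2(xi) = phi(xi/2) - phi(xi)."""
    l = np.arange(2 ** (j + 1) + 1)
    b_sq = phi(l / 2.0 ** (j + 1)) - phi(l / 2.0 ** j)
    return np.sum((2 * l + 1) * b_sq) / 2 ** (2 * j + 4)

def integral_I(phi, n=20001):
    """I = (1/8) * int_{1/2}^{2} t b^2(t) dt, by the trapezoidal rule."""
    t = np.linspace(0.5, 2.0, n)
    y = t * (phi(t / 2.0) - phi(t))
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)) / 8.0

phi = make_phi()
I = integral_I(phi)
for j in (3, 4, 6, 8):
    print(j, riemann_sum(j, phi), I)
```

With this choice of $\phi$ the sums decrease towards $I$ as $j$ grows, the gap being of order $2^{-j}$.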



3 Besov spaces on the sphere and needlets 

In this section we summarize the main properties of Besov spaces and
needlets, as established in [NPW06b].

Let $f : \mathbb{S}^d \to \mathbb{R}$ be a measurable function. We define

$$E_k(f,r) = \inf_{P \in \mathcal{P}_k} \|f - P\|_r,$$

the infimum of the distances in $L^r$ of $f$ from the polynomials of degree
$k$. Then the Besov space $B^s_{r,q}$ is defined as the space of functions
such that

$$f \in L^r \quad \text{and} \quad \Big( \sum_{k} \big(k^s E_k(f,r)\big)^q\, \frac1k \Big)^{1/q} < +\infty.$$

Remarking that $k \mapsto E_k(f,r)$ is decreasing, by a standard condensation
argument this is equivalent to

$$f \in L^r \quad \text{and} \quad \Big( \sum_{j} \big(2^{js} E_{2^j}(f,r)\big)^q \Big)^{1/q} < +\infty.$$


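Theorem 4. Let $1 \le r \le +\infty$, $s > 0$, $0 < q \le +\infty$. Let $f$
be a measurable function and define

$$\langle f, \psi_{j\eta}\rangle = \int_{\mathbb{S}^d} f(x)\, \psi_{j\eta}(x)\,dx =: \beta_{j\eta},$$

provided the integrals exist. Then $f \in B^s_{r,q}$ if and only if, for
every $j = 1, 2, \dots$,

$$\Big( \sum_{\eta \in \mathcal{Z}_j} \big( |\beta_{j\eta}|\, \|\psi_{j\eta}\|_r \big)^r \Big)^{1/r} = 2^{-js} \delta_j,$$

where $(\delta_j)_j \in \ell_q$.
As

$$c\, 2^{jd(\frac12 - \frac1r)} \le \|\psi_{j\eta}\|_r \le C\, 2^{jd(\frac12 - \frac1r)}$$

for some positive constants $c$, $C$, the Besov space $B^s_{r,q}$ turns out
to be a Banach space associated to the norm

$$\|f\|_{B^s_{r,q}} := \Big\| \big( 2^{j[s + d(\frac12 - \frac1r)]}\, \|(\beta_{j\eta})_{\eta \in \mathcal{Z}_j}\|_{\ell_r} \big)_{j \ge 0} \Big\|_{\ell_q} < \infty. \tag{15}$$

In the sequel we shall denote by $B^s_{r,q}(M)$ the ball of radius $M$ of the
Besov space $B^s_{r,q}$.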
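Theorem 5. (The Besov embedding) If $p \le r \le \infty$ then
$B^s_{r,q} \subset B^s_{p,q}$. If $s > d(\frac1r - \frac1p)$ and
$r \le p \le \infty$, then
$B^s_{r,q} \subset B^{s - d(\frac1r - \frac1p)}_{p,q}$.
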
Proof By hypothesis 

Let p < r < oo, then 

i/p 



2^<i-.) y: i/3,,«r 



= 2^-/^(2--card(ir,))^/^(^:^ E l/^.d'')''^< 

<2^-/^(2--card(^,))^/^(^:^ E 1/^.df '^ = 

^2^'^(^-^)(2-^'^card(ir,))^-^( ^ l/3.,?r)'^'' < 

<C2^-^(i-^)(E |/3,,d'')'^''<C'5.2-^-«. 
On the other hand, if r < p < oo, 

= 2^'^(^-i)2^-'^(^-^)(E i/3.,?r)'^''< 

4 Needlet estimation of a density on the sphere

Let us suppose that we observe $X_1, \dots, X_n$, i.i.d. random variables
taking values on the sphere having common density $f$ with respect to $dx$.
$f$ can be decomposed using the frame of needlets described above:

$$f = \frac{1}{\omega_d} + \sum_{j} \sum_{\eta \in \mathcal{Z}_j} \beta_{j\eta}\, \psi_{j\eta}.$$

The needlet estimator is based on hard thresholding of a needlet expansion
as follows. We start by letting:

$$\hat\beta_{j\eta} = \frac1n \sum_{i=1}^{n} \psi_{j\eta}(X_i), \tag{16}$$

$$\hat f = \frac{1}{\omega_d} + \sum_{j=0}^{J} \sum_{\eta \in \mathcal{Z}_j} \hat\beta_{j\eta}\, \psi_{j\eta}\, 1_{\{|\hat\beta_{j\eta}| \ge \kappa t_n\}}. \tag{17}$$

The tuning parameters of the needlet estimator are:

• The range $J = J(n)$ of resolution levels (frequencies) where the
approximation (17) is used:

$$\Lambda_n = \{(j,\eta),\ 0 \le j \le J,\ \eta \in \mathcal{Z}_j\}.$$

We shall see that the choice $2^{J} = \big(\frac{n}{\log n}\big)^{1/d}$ is
appropriate.

• The threshold constant $\kappa$. Evaluations of $\kappa$ are given in the
following Section and also discussed in §6.

• $t_n$ is a sample size-dependent scaling factor. We shall see that an
appropriate choice is

$$t_n = \Big( \frac{\log n}{n} \Big)^{1/2}.$$
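The two steps (16) and (17) can be sketched as follows. This is a minimal illustration, not the implementation used for the simulations: it assumes the needlet values $\psi_{j\eta}(X_i)$ have already been computed and stored in an array, and the function names `empirical_coefficients` and `hard_threshold` are hypothetical.

```python
import numpy as np

def empirical_coefficients(psi_values):
    """Eq. (16): psi_values[i, m] = psi_m(X_i); returns the mean over the sample."""
    return np.asarray(psi_values, dtype=float).mean(axis=0)

def hard_threshold(beta_hat, n, kappa):
    """Eq. (17): keep beta_hat only where |beta_hat| >= kappa * t_n.

    t_n = sqrt(log(n) / n) is the scaling factor of the text.
    Returns the thresholded coefficients and the boolean mask of survivors.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    t_n = np.sqrt(np.log(n) / n)
    keep = np.abs(beta_hat) >= kappa * t_n
    return np.where(keep, beta_hat, 0.0), keep

# Toy usage with made-up coefficients (illustrative numbers only).
beta = np.array([0.2, 0.01, -0.05, 0.03])
thresholded, keep = hard_threshold(beta, n=2000, kappa=0.5)
print(thresholded, keep)   # the first and third coefficients survive
```

The estimator $\hat f$ is then the constant $1/\omega_d$ plus the sum of the surviving coefficients times their needlets, over levels $0 \le j \le J$.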

Example 6. In order to give a better intuition about the localization and
near absence of correlation of the needlet coefficients, let us consider the
case of a sample $X_1, \dots, X_n$ of i.i.d. r.v.'s uniform on the sphere
$\mathbb{S}^2$ of $\mathbb{R}^3$. Then the distribution of the r.v.
$\langle x, X_i\rangle$ is uniform on the interval $[-1,1]$ and, if
$\hat\beta_{j\eta}$, $\hat\beta_{j\xi}$ are the corresponding needlet
coefficients associated to the cubature points $\eta$, $\xi$, then of course
they are centered r.v.'s and, thanks to (4), their covariance is equal to

$$\mathrm{Cov}(\hat\beta_{j\eta}, \hat\beta_{j\xi}) = \frac{1}{4\pi n} \sqrt{\lambda_\eta \lambda_\xi} \sum_{l \ge 0} b^2(l/2^j)\, L_l(\langle \eta, \xi\rangle).$$

Setting $\eta = \xi$ we find

$$\mathrm{Var}(\hat\beta_{j\eta}) = \frac{1}{4\pi n}\, \|\psi_{j\eta}\|_2^2,$$

which is a quantity already discussed in Example 3. As for the correlation
between coefficients, it is given by the function
$\theta \mapsto \lambda_\eta \sum_{l \ge 0} b^2(l/2^j) L_l(\cos\theta)$,
whose graph, for some values of $j$, is plotted in Figure 2.

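Remark 7. Whereas coefficients associated to cubature points that are not
too close are only slightly correlated, the random needlet coefficients
$\hat\beta_{j\eta}$, $\eta \in \mathcal{Z}_j$, are not independent and they
even satisfy the linear relation

$$\sum_{\eta \in \mathcal{Z}_j} \sqrt{\lambda_\eta}\, \hat\beta_{j\eta} = 0.$$

This comes from the fact that, as $y \mapsto L_l(\langle y,x\rangle)$ for
$1 \le l < 2^{j+1}$ is a polynomial of degree $< 2^{j+1}$, one has

$$\sum_{\eta \in \mathcal{Z}_j} \lambda_\eta L_l(\langle \eta, x\rangle) = \int L_l(\langle y, x\rangle)\,dy = 0. \tag{18}$$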

Figure 2: The decay of the covariance of $\hat\beta_{j\xi}$ and
$\hat\beta_{j\eta}$ as a function of the distance between the cubature points
$\xi$ and $\eta$ for $j = 3$ (dots) and $j = 4$ (solid) (case of a uniformly
distributed sample).


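Therefore

$$\sum_{\eta \in \mathcal{Z}_j} \sqrt{\lambda_\eta}\, \hat\beta_{j\eta} = \frac1n \sum_{i=1}^{n} \sum_{l \ge 0} b(l/2^j) \sum_{\eta \in \mathcal{Z}_j} \lambda_\eta L_l(\langle \eta, X_i\rangle) = 0.$$

Relation (18) also implies that, for a given square integrable function $f$
on $\mathbb{S}^d$, $\sum_{\eta \in \mathcal{Z}_j} \sqrt{\lambda_\eta}\, \beta_{j\eta} = 0$.
Actually

$$\sum_{\eta \in \mathcal{Z}_j} \sqrt{\lambda_\eta}\, \beta_{j\eta} = \sum_{l \ge 0} b(l/2^j) \int_{\mathbb{S}^d} \underbrace{\sum_{\eta \in \mathcal{Z}_j} \lambda_\eta L_l(\langle \eta, x\rangle)}_{=0}\, f(x)\,dx.$$
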

5 Minimax rates for $L^p$ norms and Besov spaces on the sphere

We describe the performances of the procedure by the following theorem.

Remark that the condition $s > \frac dr$ implies
$f \in B^s_{r,q} \subset B^{s - \frac dr}_{\infty,q}$, so that $f$ is
continuous. By $\mathbb{E}_f$ we denote the expectation taken with respect to
a probability with respect to which the r.v.'s $(X_n)_n$ are i.i.d. with
common density $f$.

Theorem 8. For < r < cx), p > 1, s > - we have 

a) For any z > 1, there exist some constants Coo = Coo(s,P, r, A, M) such 
that if K> ^, 



sup E^||/-/||^<coo(lognr 
/es= rj\/) 



logn 



2(=-<i(i-i) 



(19) 



b) For 1 < p < oo there exist some constant Cp = Cp{s,r,p, A, M) such that 



11 



V 1^ > To; 



sup E;||/-/|lP<Cp(lognr- 
where ap = p - 1 + l^^^^ij^j, if r < 2^> whe 



logn 



(s-d(J;-i))p 
'2(s-d(i-i)) 



(20) 



sup Ef\\f-f\\;<Cp{logn)P-^ 



logn 



2 = + d flp 



2s + d 



(21) 
Figure 3: Type of behaviour of the minimax estimator as a function of $s$,
$r$ and $p$. The region with the large dots is the one corresponding to the
sparse case of (20). The region with the small dots corresponds to the
regular case so that (21) holds (see also Remark 10). It should be noticed
that if $p \le 2$ then the slope of the straight line starting at
$\frac1p$ is smaller than the other one, so that the sparse region is empty.



Remark 9. Usually the case (20) is referred to as the sparse case, whereas
(21) is the regular case. Remark that if $p \le 2$, then we are always in the
regular case (see also Figure 3).

Remark 10. A closer look to the proof shows that, in the regular case, if 
we assume ||/||oo < -^^, then we can drop the restriction s > ^ without any 

modification if 1 < p < 2. In the case p > 2, using an additional modification 

p 

allowing J to depend also in p, 2"' = {'uTn) '^'^ ^'^ ^^ precise, we obtain the 
same rate under the same conditions as in the lower bound (up to logarithmic 
terms) . 

Theorem 11. (Lower hound) a) If 1 < p < 2, 



sup E/(||/ -/||P)> en- — 

/6Sj;,(M) 



bjlf2<p< +00 



{en 2s+d 
en 



2(s + d(^-J:)) 



^/^<^<pi(i-i) 



12 




Figure 4: The function b. 

Remark 12. As already remarked, up to logarithmic terms, the rates observed
are minimax. It is known that in this kind of estimation, full adaptation
yields unavoidable extra logarithmic terms. The rates of the logarithmic
terms obtained in Theorem 8 are suboptimal (for instance, for obvious
reasons the case $p = 2$ yields much smaller logarithmic terms). We have
focused on a simple proof giving all the results in a rather clear and
readable way. However, using a more intricate proof, the rates could be
improved so as to be comparable with those in [DJKP96].

6 Simulations 

In this section we produce the result of two numerical experiments on the
sphere $\mathbb{S}^2$. In both of them the major question concerns the choice
of the values of $J$ and $\kappa$. Actually in practical (finite sample)
situations the values given in Theorem 8 should be considered just as a
reasonable hint. The sets of cubature points in the simulations that follow
have been taken from the web site of R. Womersley
http://web.maths.unsw.edu.au/~rsw.

We realized the function $\phi$ of §2 by connecting the levels $0$ and $1$
with a function that is the primitive, suitably rescaled, of the function
$x \mapsto e^{-(1-x^2)^{-1}}$, set to be equal to $0$ outside $[-1,1]$. The
shape of the resulting function $b$ is given in Figure 4. For this choice of
$b$, we have

$$\frac18 \int_{1/2}^{2} t\, b^2(t)\,dt \approx 0.107$$




which, as remarked above, gives an indication about the square of the value
of the $L^2$ norm of a needlet $\psi_{j\eta}$. In both the examples below we
considered samples of cardinality $n = 2000$ and $n = 8000$. The hint for the
value of $J$ of Theorem 8 is $J = \frac12 \log_2\big(\frac{n}{\log n}\big)$,
which gives the values $J \approx 4.02$ and $J \approx 4.9$ respectively. One
should keep in mind that at a given level $j$ it is necessary to have enough
cubature points in order to integrate exactly all polynomials up to the
degree $2(2^{j+1} - 1) = 2^{j+2} - 2$, which means $\approx 2^{2j+4}$ cubature
points with Womersley's set (recall that on the sphere the polynomials of
degree $d$ form a vector space of dimension $(d+1)^2$). This gives
$2^{10} = 1024$ cubature points for $j = 3$, $2^{12} = 4096$ for $j = 4$ and
$2^{14} = 16384$ for $j = 5$. To avoid having more coefficients than
observations, we decided to set $J = 3$ for $n = 2000$ and $J = 4$ for
$n = 8000$.
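The arithmetic behind this choice of $J$ can be reproduced with a few lines. This is a sketch: the function names are hypothetical, and the rule "largest level whose cubature set is no larger than the sample" is only one reading of the criterion stated in the text.

```python
import math

def resolution_hint(n, d=2):
    """Hint of Theorem 8: 2^(J d) = n / log n, i.e. J = log2(n / log n) / d."""
    return math.log2(n / math.log(n)) / d

def cubature_count(j):
    """Number of cubature points used at level j for d = 2: 2^(2j+4)."""
    return 2 ** (2 * j + 4)

def largest_level(n):
    """Largest j with cubature_count(j) <= n (assumed reading of the rule)."""
    j = 0
    while cubature_count(j + 1) <= n:
        j += 1
    return j

for n in (2000, 8000):
    print(n, round(resolution_hint(n), 2), largest_level(n))
```

For $n = 2000$ and $n = 8000$ this gives the hints $4.02$ and $4.9$ and the retained levels $J = 3$ and $J = 4$ quoted above.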



               j = 0      j = 1      j = 2      j = 3
κ0 = 1         8 (.89)    29 (.45)   96 (.38)   471 (.46)
κ0 = 1.5       7 (.78)    16 (.25)   45 (.18)   264 (.26)
κ0 = 2         4 (.44)     4 (.06)   29 (.11)   126 (.12)

Table 1: number of coefficients surviving thresholding for various values of
κ0, n = 2000.





               j = 0      j = 1      j = 2      j = 3       j = 4
κ0 = 1         4 (.44)    28 (.44)   96 (.38)   413 (.40)   1610 (.39)
κ0 = 1.5       2 (.22)    12 (.19)   50 (.20)   207 (.20)    921 (.20)
κ0 = 2         1 (.11)     4 (.06)   16 (.06)    97 (.09)    368 (.09)

Table 2: number of coefficients surviving thresholding for various values of
κ0, n = 8000.



As for the value of $\kappa$, we shall give the result with
$\kappa = \kappa_0 \sqrt{0.107\, M}$, where $M$ is a bound for
$\|f\|_\infty$, trying different values of $\kappa_0$. Recall that this
means that the threshold kills all coefficients $\hat\beta_{j\eta}$ such that
$|\hat\beta_{j\eta}| < \kappa \sqrt{\frac{\log n}{n}}$.



Example 13. $f = \frac{1}{4\pi}$, the uniform density. In this case in the
development (10) it holds $\beta_{j\xi} = \langle f, \psi_{j\xi}\rangle = 0$
for every $j$ and $\xi$. Therefore a first simple way of assessing the
performance of the procedure is to count the number of coefficients that
survive thresholding. Of course in this case a good estimate is such that the
coefficients $\hat\beta_{j\xi}$ fall below the threshold. Taking into account
Lemma 2, the square root of the sum of the squares of the coefficients
surviving thresholding gives an estimate of $\|\hat f - f\|_2$. Therefore
a measure of the goodness of the fit is obtained by taking the sum of their
squares. Tables 1 and 2 give the number of surviving coefficients for
different values of the constant $\kappa_0$. In order to kill all the
coefficients one should choose $\kappa_0 = 5.4$ for $n = 2000$ and
$\kappa_0 = 2.8$ for $n = 8000$. The estimate of the $L^2$ norm of the
difference between $\hat f$ and $f$ obtained by taking the square root of
the sum of the squares of the coefficients is

               κ0 = 1     κ0 = 1.5   κ0 = 2
n = 2000       0.146      0.131      0.107
n = 8000       0.108      0.0834     0.060




Example 14. Let us consider a mixture $f$ of two densities of the form
$f_i(x) = c_i \exp(-k_i |x - x_i|^2)$, $i = 1, 2$, for $k_1 = 0.7$ and
$k_2 = 2$ and with weights $0.65$ and $0.35$ respectively. Here the centers
$x_i$ of the two bell-shaped densities were taken to be $x_1 = (0, 1, 0)$,
$x_2 = (0, -0.8, 0.6)$. With these choices it turns out that
$\|f\|_\infty = 0.26$. The graph of $f$ in the coordinates $(\varphi, \theta)$
($\varphi$ = longitude, $\theta$ = colatitude) is given in Figure 5.

The estimator $\hat f$ obtained with the choice $\kappa_0 = 1.1$ has the
graph of Figure 6. If one chooses $\kappa_0 = 1.5$ the graph becomes the one
of Figure 7. At a closer inspection it turns out that with this value of
$\kappa$ none of the coefficients at level $j = 3$ passes the thresholding.
It looks very much like the graph of $f$, even though some differences in
shape are apparent. An estimate of the norm $\|\hat f - f\|_\infty$ computed
on a grid gives $\|\hat f - f\|_\infty \approx 0.054$. We repeated the
simulation with $n = 8000$ observations. The results are reported in Figures
8 and 9 and are to be considered rather satisfactory. It should be stressed
that a very limited number of coefficients passes thresholding at a frequency
$j > 2$. This behaviour is expected: $f$ being very regular, it belongs to a
space $B^s_{r,q}$ and thus its needlet coefficients decay very rapidly.

Figure 5: Graph of the density with two bumps.

Figure 6: Graph of the estimated density (n = 2000, κ0 = 1.1).

Figure 7: Graph of the estimated density (n = 2000, κ0 = 1.65).

Figure 8: Graph of the estimated density (n = 8000, κ0 = 1.1).

Figure 9: Graph of the estimated density (n = 8000, κ0 = 1.65). It is rather
satisfactory, but for a small dent on the top of the lowest bump. With this
value of κ0 only 2 coefficients pass thresholding for j = 4 and none at level
j = 3. Now $\|\hat f - f\|_\infty \approx 0.028$.



7 Proof of Theorem 8

In the sequel we note t[l3j^^) = /3j>, 1 n a . |> ^ f, i , so that the needlet estimator 

dm) is 

1 -^ 

In this section and in the next one the density / is fixed and we shall write 
E instead of E/, as there is no danger of confusion. 

The following proposition collects the main estimates needed in the proof. 

Proposition 15. Let Ji < J be such that, for all Ji < j < J, \Pjri\ < f ^n 

(possibly Ji — J ; obviously, when f belongs to a Besov class, Ji depends on 
the "regularity" s). Then for any 7 > 0, s > 0, 2 > 1, we have 
^;,/«> 1(2+1) 



^2''^E[sup|t(/3. 



-'JVl 



P. 



3V\ 



< 



J=0 



<C 



2-^1-^(71 + lyn-i + V 2^^' sup |/3j, 



(22) 



■ n 2 
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^)'f^>^ 



and 



<C 



j=0 ri 

3=0 rieS-j 

]=0 rieSfj 



J 

j=Ji+i ve3^i 



(23) 



(24) 



We delay the proof of Proposition 15 to §8 and derive from it the proof
of Theorem 8. In this proof $C$ will denote an absolute constant which may
change from line to line. Let us now prove that Proposition 15 yields the
statements of Theorem 8.

Let us prove the $L^\infty$ upper bound (19), first under the condition
$q = r = \infty$.



E|i/-/I1L< 
J 

^H E E (^(/3,c) - /3,,)^,, ' + II E E Pov^^^ 

L II ' * ' ' CX3 II ' ' 



J=Or,e iTj 



j>Ji,eir,- 



(25) 



■.^I + II 



The term // is easy to analyze: as / belongs to B^ ^ (M) , we have using 
(O and[ll 



,>jve^, 



E E /^i'j^j'; - E E /^j'f'^j') - <^E ™p i/3j';iiiV'j>,iioo < 

j>Jrieafj j>J vesfj 

<C^2-^[^+^l2^<(-^°S'''' 

Then we only need to remark that | > ^jxjj for s > 0. 

As for /, using the triangular inequality together with Holder inequality, 
then ([T2|), and ^ with 7 = ^, z > 1, we get 



/ < C(J + ly-^ y 2'^E sup |i(/3;g - /5j^|" < 

J 

<C(J + l)^-i[2'^i'^(Ji + l)^n-t +C Y, 2"^^ sup\pjn\" + n-^] 

3 = Jl+l '"^ 



veSf, 
As $f$ belongs to $B^s_{\infty,\infty}(M)$, $|\beta_{j\eta}| \le M 2^{-j(s + \frac d2)}$ and we can see that

2,7, 



K 

2M 



logn 



is adequate, it is easy to conclude. 

For arbitrary $q$, $r$, (19) is now easy to deduce from the previous
computation by the Besov embedding (Theorem 5)
$B^s_{r,q}(M) \subset B^{s - \frac dr}_{\infty,\infty}(M)$. Let us
prove (21), that is the regular case. We observe first that since
$B^s_{r,q}(M) \subset B^s_{p,q}(M)$ for $r > p$, this case will be assimilated
to the case $p = r$, and from now on, we only consider $r \le p$. We follow
the same arguments as above. (25) can be replaced by



nf-frp< 
^ ^(^1 E E w^^i^ ~ (3,v)i',vf + II E E ^^■''^^" 

=:/ + //. 



(26) 



For // using the embedding B^ [M] C i?p,g'' '' (Af ), for r < p, we have 



^^' ^ ^1 E E ^^v^iv 

j>J neSfj 



<C2 



-j(^-T+i) 



And it is easy to verify that | — 7 + - > 2s+d '-''^ ^^"^ zone that we are 
considering in this part. In effect ass> ^{^ — -)i 2s+d — IT ^^ have 
|_i + i_f: = (i_ i)( «r - 1) > 0. 

a r p dp ^ r p ■' ^ a ' — 

For /, we have using the triangular inequality together with Holder in- 
equality, 



^1 E E (*(^^'') - ^7.)^.. < 

Then we need only to use ((24|) , with 7 = -^^z = p^ to obtain 






It is easy to realize that again 

2-^1 = 



n 2 



2M 



logn 



is adequate, and to observe that the first term in the sum has the right order. 
For the second term, it can be bounded (as p > r) by



3=Ji+i neS'j 

J 

j=Ji+i 

as, sr — ^(p — r) > 0. Now, this term obviously is of the right order. 

Again we proceed as above and observe first that in order to have s > 



as 



well as s < ^ ( i — i ) , it is necessary that p > 



2 V r p 



E||/-/ll^< 



< Cn E Y. ^^^P'Jv) ~ P,v)^ov\\l + II E E l^ov^^nVp ■■- (27) 

=:I + II. 



s^i + i 

3 I- ' I 



For // using the embedding, B^_y{M) C Bp^q'' " (M), for r < p, we have: 

j>jr]ear, 

1,1^ («-d(i-i)) ..„ 



And it is easy to verify that 



> 



^, since 2(s-d(i-i))>d, 



r ' p - 2(s-d(X"-i))' 

when s > -. 

For /, again, we have using the triangular inequality together with Holder 
inequality, 

4 E E (^('^^'') - (^^n)^^^ " ^ 

<c(j+iriE2''^'~'^ E ntiP,,)-Pj,w- 

j=0 rieSfj 

Then we need to use (P5|). with 7 = (i|, z = p, to obtain: 

i<c{j+ir-' E 2^''^^"'^ E i{ift.i>f*.}i/3..r[f^«]"'""*+ 

j<.h ve^j 

+ c(j+iriE2^'^'-'^ E i{ift,,i<2K*„}i/3,.r+n-^/'< 



j<j 



Ties; 



+ c(j+iri E 2''^'"'^ E i{ift„i<2.t„}i/3,„r +r.-^/2 < 

< 2C(J+ l)P"i2'^i{''(5-5)-Wn'^ + 

+ c(j+iri E 2^'^5-'' E i{ift.,i<2.*„}i/3..r+"-^/' 



i>Ji 



rje^. 
as we are in the sparse region. It is easy to realize that now, again because
we are in the sparse region 

2Af4ogn^ 

is adequate, and to observe then that the first term in the sum has the right
order. For the second term, let us introduce

rf(i-l) 
™ := ^5 T- 

s + - - - 

We easily observe that p— m — g,d_d > 0, and that m— r — ''l,d'_d > 
0. Then, as B^giM) C B"„7}^^ (M) 

J>j>Ji V&Sfj 

J>j>Ji ve3fj J>j>Ji 

P(s+--^) 
< Jf =+2~r 

which gives the right order. Observe that the term J (which is of logarithmic 
order) , can be avoided by choosing fh instead of m in such a way that rh > m, 
but r < in. This can be done except for the case where r = 53+3 "^here this 
logarithmic term is unavoidable. 



8 Proof of Proposition 15



The proof of Proposition [15] relies on the following lemma: 

Lemma 16. There exist constants a^ > 0, C, c, such that, as soon as 

— L log n -I " 

o 

n\Pj,-(3j,\>v}<2exp\- [, Vz;>0, (28) 

'^ 2(cr-' + ^vc2 2 ) J 

E|4,-/3,„r<s,n-«, yq>l (29) 

Esnp\pj^-pj^\^<s'^{j + l)^n-K V g > 1 (30) 

n 

P(l/3j., -/?j.,l>fin)<C2n-6«^ y^>6a^ (31) 

Proof of the lemma ([28]) is simply Bernstein inequality, noticing that 
E{ijjr,{X,)f < ll/llocllV-jJIa < MC =: a^ 



and ||-0j^(Xi)||oo < c2 2 . The following inequality directly follows from ([28]) . 
when 2^'^ < n. 

P{|/3^^-/3j^l>4<2[e"S^+e-^^] (32) 
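(29) follows from

$$\mathbb{E}|\hat\beta_{j\eta}-\beta_{j\eta}|^q = \int_0^\infty q\,v^{q-1}P\big(|\hat\beta_{j\eta}-\beta_{j\eta}|>v\big)\,dv
\le \int_0^\infty q\,v^{q-1}\,2\Big[e^{-\frac{nv^2}{4\sigma^2}}+e^{-\frac{3\sqrt n\, v}{4c}}\Big]dv \le s_q\, n^{-q/2},$$

using the change of variables $u = \sqrt n\, v$.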
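The change of variables can be written out explicitly. This is a worked sketch, assuming the tail bound (32) in the form $2[e^{-nv^2/(4\sigma^2)}+e^{-3\sqrt n v/(4c)}]$; the closed form for $s_q$ below is one admissible choice of constant, not necessarily the one intended in the original:

```latex
\begin{align*}
\int_0^\infty q\,v^{q-1}\,2\Big[e^{-\frac{n v^2}{4\sigma^2}} + e^{-\frac{3\sqrt{n}\,v}{4c}}\Big]\,dv
  &= n^{-q/2}\int_0^\infty q\,u^{q-1}\,2\Big[e^{-\frac{u^2}{4\sigma^2}} + e^{-\frac{3u}{4c}}\Big]\,du
     && (u = \sqrt{n}\,v) \\
  &= n^{-q/2}\cdot 2\Big[(2\sigma)^q\,\Gamma\big(\tfrac q2 + 1\big)
       + \big(\tfrac{4c}{3}\big)^q\,\Gamma(q+1)\Big].
\end{align*}
% Hence one may take
% s_q = 2[(2\sigma)^q \Gamma(q/2+1) + (4c/3)^q \Gamma(q+1)],
% which depends on q, \sigma and c but not on n or j.
```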

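(30) also follows from (32): take $a = \max\{\frac{8cd}{3},\, 2\sqrt{2d}\,\sigma\}$. Then

$$\mathbb{E}\sup_{\eta\in\mathcal{Z}_j}|\hat\beta_{j\eta}-\beta_{j\eta}|^q
= \int_0^\infty q\,v^{q-1}P\Big(\sup_{\eta\in\mathcal{Z}_j}|\hat\beta_{j\eta}-\beta_{j\eta}| > v\Big)\,dv
\lesssim \int_{v\le \frac{aj}{\sqrt n}} q\,v^{q-1}\,dv
+ \int_{v> \frac{aj}{\sqrt n}} q\,v^{q-1}\,2^{jd}\Big[e^{-\frac{nv^2}{4\sigma^2}}+e^{-\frac{3\sqrt n\, v}{4c}}\Big]dv .$$

Now, if $v \ge \frac{aj}{\sqrt n}$, $2^{jd}e^{-\frac{nv^2}{4\sigma^2}} \le e^{-\frac{nv^2}{4\sigma^2}+jd} \le e^{-\frac{nv^2}{8\sigma^2}}$. Similarly
$2^{jd}e^{-\frac{3\sqrt n\, v}{4c}} \le e^{-\frac{3\sqrt n\, v}{8c}}$, so that

$$\mathbb{E}\sup_{\eta\in\mathcal{Z}_j}|\hat\beta_{j\eta}-\beta_{j\eta}|^q
\lesssim \frac{(aj)^q}{n^{q/2}} + \int_{v> \frac{aj}{\sqrt n}} q\,v^{q-1}\Big[e^{-\frac{nv^2}{8\sigma^2}}+e^{-\frac{3\sqrt n\, v}{8c}}\Big]dv
\le s'_q\,(j+1)^q\, n^{-q/2}.$$

$\square$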
Let us now turn to the proof of the Proposition. We partition our sum 
in four regions: 



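$$\sum_{j=0}^{J}2^{j\gamma}\,\mathbb{E}\sup_{\eta\in\mathcal{Z}_j}|t(\hat\beta_{j\eta})-\beta_{j\eta}|^z
= \sum_{j=0}^{J}2^{j\gamma}\,\mathbb{E}\sup_{\eta\in\mathcal{Z}_j}|t(\hat\beta_{j\eta})-\beta_{j\eta}|^z\Big\{1_{\{|\hat\beta_{j\eta}|\ge\kappa t_n\}}+1_{\{|\hat\beta_{j\eta}|<\kappa t_n\}}\Big\}$$

$$\le \sum_{j=0}^{J}2^{j\gamma}\,\mathbb{E}\sup_{\eta\in\mathcal{Z}_j}|\hat\beta_{j\eta}-\beta_{j\eta}|^z\,1_{\{|\hat\beta_{j\eta}|\ge\kappa t_n\}}1_{\{|\beta_{j\eta}|\ge\frac{\kappa}{2} t_n\}}
+ \sum_{j=0}^{J}2^{j\gamma}\,\mathbb{E}\sup_{\eta\in\mathcal{Z}_j}|\hat\beta_{j\eta}-\beta_{j\eta}|^z\,1_{\{|\hat\beta_{j\eta}|\ge\kappa t_n\}}1_{\{|\beta_{j\eta}|<\frac{\kappa}{2} t_n\}}$$

$$+ \sum_{j=0}^{J}2^{j\gamma}\,\mathbb{E}\sup_{\eta\in\mathcal{Z}_j}|\beta_{j\eta}|^z\,1_{\{|\hat\beta_{j\eta}|<\kappa t_n\}}1_{\{|\beta_{j\eta}|\ge 2\kappa t_n\}}
+ \sum_{j=0}^{J}2^{j\gamma}\,\mathbb{E}\sup_{\eta\in\mathcal{Z}_j}|\beta_{j\eta}|^z\,1_{\{|\hat\beta_{j\eta}|<\kappa t_n\}}1_{\{|\beta_{j\eta}|<2\kappa t_n\}}$$

$$=: Bb + Bs + Sb + Ss .$$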

We use extensively Lemma 16 in order to bound separately each of the
four terms Bb, Ss, Sb, Bs.
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Using (1201) 

J 

Bb < ^2-'"^E sup \I3„ - I3„nw„\>^u.} 

J 

^ H ^{3 near,. |/3„|>f t„}2^^IE sup |/3j^ - /3j^|^ 

J 

where $J_1$ is chosen such that for $j > J_1$, $|\beta_{j\eta}| < \frac{\kappa}{2}\,t_n$. Also

J 

5s<E2^"^sup|/3,„|n{|^^,,l<2.t„} 

Jl J 

<Y,^'^['2^^tnV + Y. 2^'' sup |/3,„r 
which gives the proper rate of convergence. Moreover, using (30) and (31),

,7 

BS < E2^TE sup l/3j,, -/3jr,ri{|^^^_^^^|>~t„}l{|ft„|<tt„} 

J 

J 

< E 2-''^[E sup |/3~, - /?,,P1^P{3 r, e iT,, l/S^, - /3,,| > ^t^}^ 

J 

< E2^''[4(j + l)2^n-^]^[c2^''n^6«]5 < n"* 

where k > 5(-J + ^)- Finally, using (|3ip . and the fact that for / bounded, 

|/3,,|<C2-¥ 

J 

Sb < ^2^^ sup |/?..ri^|^^.^_^-^|>^,^jl{|^„|>2.*„} 

J 

j=o 
J 

< E[c2J"['^(l-t)+^l77-6«] 
3=0 

< C[2-'[''(i-t)+Tlr7-6'^] <7i-i, 
for «> 1(2 + 1). 
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8.1 Proof of (IID and (M 



This proof follows along the lines of the previous one. ([M| is a consequence 
of (^5]) . and the two inequalities will be proved together. We again separate 
the four cases. 

J 

<Y^2^h~d)E Y^ |/3;.-/?,.ri{|^:„|>.,„}l{|ft,|>t*„} + 

+ J22^(.-^)E Y: l/5^.-/^..ri{|ft„|>«.„}l{|ft.|<f*„} + 

j=0 rieSfj 

Let us now bound separately each of the four terms Bb, Ss, Sb, Bs. Using 
J 

j=0 -qdSSfj 

J 

< C2-'i''n"i 

where $J_1$ again is chosen such that for $j > J_1$, $|\beta_{j\eta}| < \frac{\kappa}{2}\,t_n$. To prove (23),
we stop at (*); the next bound yields (24).

5s<^2^-(^-'^)^ l/3,.ri{|ft,|<2.t„} (*) 

Ji J 






which gives the proper rate of convergence. Again, to prove (P5)) . we stop in 
(*), the next bound yields ^^. Moreover, using ([SO]) and ((3T|) . 



j=o nes-j 

J 

j=o .jes-j 

,7 

j=o 

for K > ^. Finally, using ([?T|l . and the fact that for / bounded, |/3j^| < 
C2"^ 

.7 

J 

<^[c2^"[-^+^ln-6T 

< C[2-'[-*+^ln-6«] < Cn^ 
for K > 0. 



9 Proof of the lower bound 

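Let us recall that given two probabilities $P$, $Q$ on some measure space their
Kullback-Leibler distance is

$$K(P,Q)=\begin{cases}\displaystyle\int\log\frac{dP}{dQ}\,dP=\int\frac{dP}{dQ}\log\frac{dP}{dQ}\,dQ & \text{if } P\ll Q,\\[1ex] +\infty & \text{otherwise.}\end{cases}$$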

If $P$, $Q$ are probabilities on $\mathbb{S}^d$ having densities $f$, $g$ respectively with respect
to Lebesgue measure, then if $g$ is bounded below by some constant $c > 0$,

$$K(P,Q)=\int\log\frac{f}{g}\,f\,dx=\int\log\Big(\frac{f-g}{g}+1\Big)f\,dx
\le\int\frac{f-g}{g}\,f\,dx=\int\frac{(f-g)^2}{g}\,dx\le\frac1c\,\|f-g\|_2^2. \tag{33}$$
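The chain of inequalities in (33) can be checked numerically on a discretised example. The sketch below is an illustration only: it replaces the sphere by a hypothetical uniform grid of `m` cells of measure `w`, and the helper names `random_density` and `kl_and_bounds` are assumptions introduced here, not part of the paper.

```python
import math
import random


def random_density(m, w, rng, floor=0.2):
    """Random piecewise-constant density on m cells of measure w, bounded below."""
    raw = [floor + rng.random() for _ in range(m)]
    total = sum(r * w for r in raw)
    return [r / total for r in raw]


def kl_and_bounds(f, g, w):
    """Return (KL(f, g), integral of (f-g)^2/g, (1/c) * ||f-g||_2^2) with c = min g."""
    c = min(g)
    kl = sum(fi * math.log(fi / gi) * w for fi, gi in zip(f, g) if fi > 0)
    chi2 = sum((fi - gi) ** 2 / gi * w for fi, gi in zip(f, g))
    l2_bound = sum((fi - gi) ** 2 * w for fi, gi in zip(f, g)) / c
    return kl, chi2, l2_bound


rng = random.Random(0)
m = 200
w = 1.0 / m
f = random_density(m, w, rng)
g = random_density(m, w, rng)
kl, chi2, l2_bound = kl_and_bounds(f, g, w)
print(kl, chi2, l2_bound)  # the three values should appear in increasing order
```

The middle quantity is the chi-square divergence; the lower bound on $g$ is needed only for the last step, which is why the densities constructed in the lower-bound argument are kept away from zero.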



We make use of Fano's lemma below, see [Tsy04] and the references therein.
We use the point of view introduced in [Bir01].

Theorem 17. (Fano's Lemma) Let $\mathcal{A}$ be a sigma algebra on the space $\Omega$. Let
$A_i \in \mathcal{A}$, $i \in \{0, 1, \dots, m\}$ such that $\forall i \ne j$, $A_i \cap A_j = \emptyset$. Let $P_i$, $i = 0, \dots, m$
be probability measures on $(\Omega, \mathcal{A})$. If

$$p := \sup_{i=0,\dots,m} P_i(A_i^c), \tag{34}$$

$$\kappa(P_0,\dots,P_m) := \inf_{j=0,\dots,m}\frac1m\sum_{i\ne j}K(P_i,P_j), \tag{35}$$

then

$$p \ge \frac12\wedge\Big(C\sqrt m\,e^{-\kappa(P_0,\dots,P_m)}\Big), \qquad C = e^{-3/e}. \tag{36}$$
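To see how the bound in Fano's lemma behaves numerically, here is a minimal sketch. It assumes the bound has the form $\frac12\wedge(C\sqrt m\,e^{-\kappa})$ with $C=e^{-3/e}$; the function `hypothetical_bound` and its constants `c_card` and `c_kl` are illustrative assumptions standing in for the unspecified constants of the lower-bound construction (family of size about $2^{c\,2^{jd}}$, Kullback-Leibler distances of order $n2^{-2js}$).

```python
import math

# Constant C = exp(-3/e), assumed to be the constant of the lemma.
FANO_C = math.exp(-3.0 / math.e)


def fano_lower_bound(log_m, kappa):
    """Evaluate min(1/2, C * sqrt(m) * exp(-kappa)), with m given through log(m)."""
    log_term = math.log(FANO_C) + 0.5 * log_m - kappa
    # Cap the exponent at 0 to avoid overflow: values above 1 are cut to 1/2 anyway.
    return min(0.5, math.exp(min(log_term, 0.0)))


def hypothetical_bound(n, j, s, d, c_card=1.0, c_kl=1.0):
    """Fano bound for a family with log2(m) = c_card * 2^(jd), kappa = c_kl * n * 2^(-2js)."""
    log_m = c_card * (2 ** (j * d)) * math.log(2)
    kappa = c_kl * n * 2.0 ** (-2 * j * s)
    return fano_lower_bound(log_m, kappa)


# With n = 2^16, s = 1, d = 2: the bound is negligible at j = 4 and saturates at j = 5.
print(hypothetical_bound(2 ** 16, 4, 1, 2))
print(hypothetical_bound(2 ** 16, 5, 1, 2))
```

The sharp transition between consecutive levels reflects the competition between $\log m \simeq 2^{jd}$ and $\kappa \simeq n2^{-2js}$; the bound stays away from zero only once the first dominates the second.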

• We prove first that the minimax $L^p$-loss is $\gtrsim n^{-\alpha p}$ with $\alpha = \frac{s}{2s+d}$. For
every $j$ let us consider the family $\mathcal{A}_j$ of densities

where $A_j$ is a subset of $\mathcal{Z}_j$ to be made precise later, $\varepsilon_\xi \in \{0, 1\}$ and $\gamma$ is
chosen so that all these functions are positive. We are going to show that
for every estimator $\hat f$,

$$\sup_{\varepsilon}\mathbb{E}_{f_\varepsilon}\|\hat f-f_\varepsilon\|_p^p \ge c\,n^{-\frac{sp}{2s+d}} .$$

Throughout this section we shall note $x \lesssim y$, $x \gtrsim y$ whenever it holds $x \le cy$
or $x \ge cy$ respectively, $c$ being a strictly positive constant independent of
$j$, $\xi$. We shall note $x \simeq y$ whenever both $x \lesssim y$ and $x \gtrsim y$ hold. Thanks to
(12) for these functions to be positive it is enough that $|\gamma| \lesssim 2^{-jd/2}$. Such a
$\gamma$ can even be chosen in such a way that all the densities (37) are bounded
from below by a strictly positive constant. If the functions $(\psi_{j\xi})_{\xi\in\mathcal{Z}_j}$ were
orthonormal we would have immediately that






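$$\Big\|\sum_{\xi\in\mathcal{Z}_j}\lambda_\xi\psi_{j\xi}\Big\|_p \ge c\Big(\sum_{\xi\in\mathcal{Z}_j}|\lambda_\xi|^p\,\|\psi_{j\xi}\|_p^p\Big)^{1/p} .$$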



Needlets are not a basis, but their scalar product is close enough to $0$ if the
respective cubature points are far enough apart. Hence one can get the following
lemma, which states that a subset $A_j \subset \mathcal{Z}_j$ can be chosen so that it is quite
large and inequalities (14) and (13), in a sense, can be reversed.



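Lemma 18. There exists a subset $A_j \subset \mathcal{Z}_j$ such that $\operatorname{card} A_j \gtrsim 2^{jd}$ and

$$\Big\|\sum_{\xi\in A_j}\lambda_\xi\psi_{j\xi}\Big\|_p \ge
\begin{cases}
c\Big(\sum_{\xi\in A_j}|\lambda_\xi|^p\,\|\psi_{j\xi}\|_p^p\Big)^{1/p} & \text{if } p < \infty,\\[1ex]
c\,\sup_{\xi\in A_j}|\lambda_\xi|\,\|\psi_{j\xi}\|_\infty & \text{if } p = \infty.
\end{cases}$$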



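Let us now impose conditions that ensure that $f_\varepsilon$ belongs to the ball
$B^s_{rq}(M)$. Now, recalling the definition of the Besov norm in terms of needlet coefficients,

$$\|f_\varepsilon\|_{B^s_{rq}} = |\gamma|\,2^{j(s+d(\frac12-\frac1r))}\Big(\sum_{\xi\in A_j}|\varepsilon_\xi|^r\Big)^{1/r} \simeq |\gamma|\,2^{j(s+d(\frac12-\frac1r))}\,2^{jd/r}$$

where we use the fact that $|\varepsilon_\xi| = 1$. Therefore the condition $\|f_\varepsilon\|_{B^s_{rq}} \le M$
follows from

$$|\gamma| \le M\,2^{-j(s+\frac d2)} .$$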

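In order to apply Fano's Lemma and get a lower bound of the left hand side
let us first get an upper bound for the Kullback-Leibler distances $K(f_\varepsilon, f_{\varepsilon'})$,
which comes from (33) and (12) for $p = 2$,

$$\|f_\varepsilon-f_{\varepsilon'}\|_2^2 \lesssim \gamma^2\sum_{\xi\in A_j}|\varepsilon_\xi-\varepsilon'_\xi|^2 \lesssim \gamma^2\,2^{jd} \lesssim 2^{-2js} . \tag{38}$$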

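By Lemma 18,

$$\|f_\varepsilon-f_{\varepsilon'}\|_p \gtrsim |\gamma|\Big(\sum_{\xi\in A_j}|\varepsilon_\xi-\varepsilon'_\xi|^p\,\|\psi_{j\xi}\|_p^p\Big)^{1/p} .$$

Thanks to Lemma 18 the set of functions $\mathcal{A}_j$ has a cardinality that is $\ge 2^{c2^{jd}}$.
By the Varshamov-Gilbert Lemma ([Tsy04] e.g.) there exists a subset
$\mathcal{A}'_j \subset \mathcal{A}_j$ such that $\operatorname{card}\mathcal{A}'_j \ge 2^{c'2^{jd}}$ and such that if $f_\varepsilon, f_{\varepsilon'} \in \mathcal{A}'_j$, then
$\sum_{\xi\in A_j}|\varepsilon_\xi-\varepsilon'_\xi| \gtrsim 2^{jd}$. Therefore, as $|\varepsilon_\xi-\varepsilon'_\xi|$ can be $= 0$ or $= 1$ only and
by (13),

$$\|f_\varepsilon-f_{\varepsilon'}\|_p \gtrsim |\gamma|\,2^{jd(\frac12-\frac1p)}\,(2^{jd})^{1/p} \simeq 2^{-js}$$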

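which implies that the events $\{\|\hat f-f_\varepsilon\|_p < \frac12\,2^{-js}\}$ are disjoint. The family
of densities $f_\varepsilon \in \mathcal{A}'_j$ given by the Varshamov-Gilbert Lemma has cardinality
$m \simeq 2^{c'2^{jd}}$ and by (33) and (38)

$$K(f_\varepsilon, f_{\varepsilon'}) \lesssim \|f_\varepsilon-f_{\varepsilon'}\|_2^2 \lesssim 2^{-2js} .$$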
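The Varshamov-Gilbert Lemma guarantees a large family of binary vectors $\varepsilon$ that are pairwise far apart in Hamming distance. The sketch below is not the proof of that lemma; it is a simple greedy random construction, with hypothetical names `hamming` and `greedy_separated_subset`, that shows the kind of object the lemma provides for a small dimension `m` (playing the role of $\operatorname{card} A_j$).

```python
import random


def hamming(a, b):
    """Number of coordinates in which the binary vectors a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_separated_subset(m, min_dist, trials, seed=0):
    """Greedily collect vectors of {0,1}^m with pairwise Hamming distance >= min_dist."""
    rng = random.Random(seed)
    kept = [tuple([0] * m)]  # start from the zero vector
    for _ in range(trials):
        cand = tuple(rng.randint(0, 1) for _ in range(m))
        if all(hamming(cand, k) >= min_dist for k in kept):
            kept.append(cand)
    return kept


m = 16
code = greedy_separated_subset(m, min_dist=m // 8, trials=200)
print(len(code))  # size of the separated family found by the greedy search
```

In the argument above, a Hamming separation of order $2^{jd}$ between $\varepsilon$ and $\varepsilon'$ is what turns into an $L^p$ separation of order $2^{-js}$ between $f_\varepsilon$ and $f_{\varepsilon'}$.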

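We apply now Fano's lemma to the probabilities $P_\varepsilon$ that are the $n$ times
product of $f_\varepsilon\,dx$ and to the events $A_\varepsilon = \{\|\hat f-f_\varepsilon\|_p < \frac12\,2^{-js}\}$. It is well
known that

$$K(P_\varepsilon, P_{\varepsilon'}) = n\,K(f_\varepsilon\,dx, f_{\varepsilon'}\,dx) .$$

By Markov inequality and Fano's lemma, with $\delta = \frac12\,2^{-js}$,

$$\sup_{\varepsilon}\mathbb{E}\|\hat f-f_\varepsilon\|_p^p \ge 2^{-p}\,2^{-jsp}\sup_{\varepsilon}P_\varepsilon\big(\|\hat f-f_\varepsilon\|_p \ge \delta\big)
\ge 2^{-p}\,2^{-jsp}\Big(\frac12\wedge C\,e^{-cn2^{-2js}}\sqrt{2^{c'2^{jd}}}\Big) .$$

Now let $j$ be so that $n2^{-2js} \simeq 2^{jd}$, that is $2^j \simeq n^{\frac{1}{2s+d}}$. With this choice one
has

$$\frac12\wedge\Big(e^{-cn2^{-2js}}\,e^{c'2^{jd}}\Big) \ge c'' > 0 .$$

Therefore

$$\sup_{\varepsilon}\mathbb{E}\|\hat f-f_\varepsilon\|_p^p \ge c\,2^{-jsp} \simeq n^{-\frac{sp}{2s+d}} .$$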
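The choice of the level $j$ is a balance between the size of the family and the Kullback-Leibler distances. As a worked sketch (the numerical values $s=1$, $d=2$, $n=2^{16}$ are illustrative assumptions, not taken from the paper):

```latex
\begin{align*}
n\,2^{-2js} \simeq 2^{jd}
  &\iff 2^{j(2s+d)} \simeq n
  \iff 2^{j} \simeq n^{\frac{1}{2s+d}}, \\
2^{-jsp} = \big(2^{j}\big)^{-sp}
  &\simeq n^{-\frac{sp}{2s+d}} .
\end{align*}
% Illustration with s = 1, d = 2, n = 2^{16}:
%   2^{j} = n^{1/4} = 16, so j = 4, and the bound is of order 16^{-p} = n^{-p/4}.
```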

• We prove now that the minimax $L^p$-loss is $\gtrsim n^{-\frac{p(s+d(\frac1p-\frac1r))}{2(s+d(\frac12-\frac1r))}}$. Let us consider
the two densities
with $\gamma$ such that the above are positive ($|\gamma| \lesssim 2^{-jd/2}$ is enough). If $|\gamma| \le
2^{-js}\,2^{-jd(\frac12-\frac1r)}M$, then, by the definition of the Besov norm, both $f_0$ and $f_1$ belong to the ball
$B^s_{rq}(M)$. Remark that this condition implies $|\gamma| \lesssim 2^{-jd/2}$, as we assume
$s > \frac dr$. Also

$$K(f_0\,dx, f_1\,dx) \lesssim \|f_0-f_1\|_2^2 \lesssim \gamma^2$$

so that, if we denote by $P_0$, $P_1$ the $n$-times product of $f_0\,dx$ and $f_1\,dx$ by itself
respectively, $K(P_0, P_1) \lesssim n\gamma^2$. By (13) and Lemma 18 we have

$$\|f_0-f_1\|_p = |\gamma|\,\|\psi_{j\xi}-\psi_{j\xi'}\|_p \gtrsim |\gamma|\,2^{jd(\frac12-\frac1p)} . \tag{39}$$

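We choose $\gamma = \frac{1}{\sqrt n} = 2^{-j(s+d(\frac12-\frac1r))}$, so that $K(P_0, P_1) \lesssim 1$. Moreover with
this choice of $\gamma$, $j \simeq \log n\,\big(2(s+d(\frac12-\frac1r))\big)^{-1}$, so that again by Markov inequality,

$$\sup_{i=0,1}\mathbb{E}\|\hat f-f_i\|_p^p \ge \delta^p\sup_{i=0,1}P_i\big(\|\hat f-f_i\|_p \ge \delta\big) .$$

Thanks to (39) the events $\{\|\hat f-f_i\|_p < \delta\}$ are disjoint if $\delta \lesssim 2^{-j(s+d(\frac1p-\frac1r))}$.
Therefore by Fano's lemma

$$\sup_{i=0,1}\mathbb{E}\|\hat f-f_i\|_p^p \gtrsim \delta^p \simeq 2^{-jp(s+d(\frac1p-\frac1r))} \simeq n^{-\frac{p(s+d(\frac1p-\frac1r))}{2(s+d(\frac12-\frac1r))}} .$$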

We have therefore proved that $\sup_{f\in B^s_{rq}(M)}\mathbb{E}_f\|\hat f-f\|_p^p$ is
$\gtrsim n^{-\frac{sp}{2s+d}}$ and $\gtrsim n^{-\frac{p(s+d(\frac1p-\frac1r))}{2(s+d(\frac12-\frac1r))}}$.

Putting things together and checking for which values of the parameters one
rate is larger than the other one concludes the proof of Theorem 11. Note
that, as $s > \frac dr$, if $1 \le p \le 2$

$$\frac{sp}{2s+d} \le \frac{p(s+d(\frac1p-\frac1r))}{2(s+d(\frac12-\frac1r))} .$$
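The comparison of the two exponents can be explored numerically. The sketch below assumes the two lower-bound rates are $n^{-sp/(2s+d)}$ and $n^{-p(s+d(1/p-1/r))/(2(s+d(1/2-1/r)))}$; the function names and the labels "regular" and "sparse" are illustrative choices made here.

```python
def dense_exponent(s, d, p):
    """Exponent a in the rate n^(-a) of the first lower bound."""
    return s * p / (2 * s + d)


def sparse_exponent(s, d, p, r):
    """Exponent b in the rate n^(-b) of the second lower bound."""
    return p * (s + d * (1 / p - 1 / r)) / (2 * (s + d * (0.5 - 1 / r)))


def dominant_regime(s, d, p, r):
    """Return which lower bound is larger, i.e. which rate has the smaller exponent."""
    if not s > d / r:
        raise ValueError("the comparison assumes s > d / r")
    a = dense_exponent(s, d, p)
    b = sparse_exponent(s, d, p, r)
    return "regular" if a <= b else "sparse"


print(dominant_regime(1.0, 2, 1.5, 3.0))   # a case with 1 <= p <= 2
print(dominant_regime(1.0, 2, 10.0, 3.0))  # a case with r < dp / (2s + d)
```

The two exponents coincide when $r = \frac{dp}{2s+d}$, which is the boundary between the two regimes.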